Use type checking to detect invalid mutants #468

Merged
Otto-AA merged 6 commits into main from type-checking
Feb 22, 2026

Otto-AA (Collaborator) commented Feb 14, 2026

Fixes #467

What works so far:

  • runs custom type checker on ./mutants/
  • parses errors from JSON output (pyright, pyrefly)
  • maps error lines to mutants
  • disables mutants
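
The "maps error lines to mutants" step above could look roughly like the following. This is only a sketch, not the code from this PR: it assumes the error line numbers have already been extracted from the type checker's JSON output, and it assumes mutant functions can be recognized by a `__mutmut_<number>` suffix, as in the generated names quoted later in this thread. The helper names are hypothetical.

```python
import ast
import re

# Assumption: mutant functions end in "__mutmut_<number>"; the untouched
# original ends in "__mutmut_orig" and is deliberately not matched.
MUTANT_NAME = re.compile(r"__mutmut_\d+$")


def mutant_line_ranges(source: str) -> dict[str, tuple[int, int]]:
    """Map each mutant function name to its (first_line, last_line), 1-indexed."""
    ranges: dict[str, tuple[int, int]] = {}
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and MUTANT_NAME.search(node.name):
            ranges[node.name] = (node.lineno, node.end_lineno)
    return ranges


def mutants_with_type_errors(source: str, error_lines: list[int]) -> set[str]:
    """Return the names of mutants whose line range contains a reported error."""
    ranges = mutant_line_ranges(source)
    return {
        name
        for name, (start, end) in ranges.items()
        for line in error_lines
        if start <= line <= end
    }


EXAMPLE_SOURCE = '''\
def foo__mutmut_orig(a: int) -> int:
    return a + 1

def foo__mutmut_1(a: int) -> int:
    return a + "1"

def foo__mutmut_2(a: int) -> int:
    return a - 1
'''

# Pretend the type checker reported an error on line 5 of the mutated file.
invalid_mutants = mutants_with_type_errors(EXAMPLE_SOURCE, [5])
```

Errors that fall outside every mutant range (for example in the original function) are ignored by this sketch; a real implementation would probably want to report them instead.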

Big TODOs:

I think for the class properties problem, we would need to define the mutated methods outside of the class and dynamically either add them or overwrite the original method. (EDIT: this won't work with Self types, @classmethod, etc.)

To keep the original types of mutated methods, using libcst to copy the signature should be fine.

Otto-AA force-pushed the type-checking branch 3 times, most recently from 2863520 to 00778dc, February 15, 2026 10:24

Otto-AA commented Feb 15, 2026

At least for the small E2E test it works well with pyrefly: 🧙

image

I'll try to get it working with pyrefly on a medium-sized project first, and only later check whether pyright and others can also be supported. I think for pyright we would need to relax some type checking rules.

I also need to debug why the tests fail in CI: pyrefly seems to output JSON plus normal text in CI, but only JSON locally.
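
One possible workaround for output that mixes JSON with ordinary log text is to scan for the first decodable JSON object and ignore everything around it. The sketch below is not what this PR does (the remaining TODOs mention waiting for a fixed pyrefly release instead). The function name and the sample output are made up for illustration, and the `"errors"` key is not a claim about pyrefly's real schema.

```python
import json


def extract_first_json_object(output: str):
    """Return the first JSON object embedded in `output`, skipping other text."""
    decoder = json.JSONDecoder()
    start = output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(output, start)
            return value
        except json.JSONDecodeError:
            # This "{" was not the start of valid JSON; try the next one.
            start = output.find("{", start + 1)
    raise ValueError("no JSON object found in type checker output")


# Hypothetical mixed output: log lines around a JSON payload.
mixed_output = 'INFO checking project\n{"errors": [{"line": 5}]}\nINFO done\n'
parsed = extract_first_json_object(mixed_output)
```

This is tolerant rather than strict: if a log line happens to contain a small valid JSON object before the real payload, that object would be returned instead.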

Otto-AA force-pushed the type-checking branch 2 times, most recently from 93a17ee to df6e6fd, February 16, 2026 19:43

Otto-AA commented Feb 16, 2026

On my sample repo it worked for pyrefly and mypy, but not for pyright and ty.

The difference is that, in the following example, they infer the following types for self.x in c:

  • pyrefly: int
  • mypy: int
  • ty: Unknown | Literal[2, "a"]
  • pyright: int | str

from typing import reveal_type

class Foo:
    def __init__(self):
        self.x = 2

    def some_mutant(self):
        self.x = 'a'

    def c(self) -> int:
        reveal_type(self.x)
        return self.x

Thus, pyrefly and mypy infer the type of self.x from its first assignment (and report an error in some_mutant because str is not assignable to int). Pyright and ty infer the union of all assignments (plus Unknown for ty?), so some_mutant does not produce an error itself but changes the type of self.x, which can break previously valid types in other functions such as c.

I can't currently think of a way to fix this other than copying the whole class per mutant. I won't implement that for now; most likely I'll add this feature only for pyrefly and mypy.
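
To make the "copy the whole class per mutant" idea concrete, here is a rough sketch using the standard-library `ast` module (Python 3.9+ for `ast.unparse`). It is not implemented in this PR, and every name in it is hypothetical. The idea is that each copy contains exactly one mutant in place of the original method, so a checker that unions all assignments would presumably only see that mutant's assignment in that copy.

```python
import ast


def class_copy_for_mutant(source: str, class_name: str, method: str, mutant: str) -> str:
    """Return source for a renamed copy of `class_name` in which `method`
    is replaced by the body of `mutant` and the separate mutant def is dropped."""
    tree = ast.parse(source)
    cls = next(
        node for node in tree.body
        if isinstance(node, ast.ClassDef) and node.name == class_name
    )
    methods = {n.name: n for n in cls.body if isinstance(n, ast.FunctionDef)}
    original, replacement = methods[method], methods[mutant]
    replacement.name = method  # the mutant takes over the original's name

    new_body = []
    for node in cls.body:
        if node is original:
            new_body.append(replacement)
        elif node is replacement:
            continue  # already placed where the original was
        else:
            new_body.append(node)
    cls.body = new_body
    cls.name = f"{class_name}__{mutant}"
    return ast.unparse(cls)


EXAMPLE_CLASS = '''
class Foo:
    def __init__(self):
        self.x = 2

    def set_x(self):
        self.x = 3

    def set_x__mutmut_1(self):
        self.x = "a"

    def c(self) -> int:
        return self.x
'''

copy_source = class_copy_for_mutant(EXAMPLE_CLASS, "Foo", "set_x", "set_x__mutmut_1")
```

The obvious cost is that the amount of code to type check grows with the number of mutants per class, which may cancel out the performance benefit of filtering.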

Remaining TODOs:

  • unit test error parsing
  • revisit error handling/messages
  • hide filtered mutants in the browser TUI
  • sample some of the filtered mutants in my sample repo to check if it works as expected
  • wait for the next pyrefly release which fixes the pyrefly JSON output in github actions

Otto-AA force-pushed the type-checking branch 2 times, most recently from 47f7dc7 to 3474f7e, February 22, 2026 15:28

Otto-AA commented Feb 22, 2026

I found one case where this filter can remove interesting mutations:

class Foo:
    def __init__(self, a: int) -> None:
        self.a = a

mutates to:

class Foo:
    def __init__(self, a: int) -> None:
        args = [a]# type: ignore
        kwargs = {}# type: ignore
        return _mutmut_trampoline(object.__getattribute__(self, 'xǁFooǁ__init____mutmut_orig'), object.__getattribute__(self, 'xǁFooǁ__init____mutmut_mutants'), args, kwargs, self)
    def xǁFooǁ__init____mutmut_orig(self, a: int) -> None:
        self.a = a
    def xǁFooǁ__init____mutmut_1(self, a: int) -> None:
        self.a = None # <- this is now a type error, because mypy/pyrefly already inferred 'int' on the first self.a = a definition

This should be a valid mutation; however, it is marked as a type error. I can't think of a way to fix this other than copying the whole class per mutation (similar to the pyright/ty problem described above).

More generally, I've noticed that even when a mutation has a type error, there are cases where running it would still reveal a gap in the tests.
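
A small, made-up illustration of that last point: the mutated class below stands in for a mutant that a type checker would reject once it has inferred `int` for the attribute, yet it still runs, and a test that never reads the attribute does not notice the change. The class and test names are hypothetical.

```python
class Foo:
    def __init__(self, a: int) -> None:
        self.a = a


class MutatedFoo:
    """Stand-in for the mutant: it would be filtered as a type error,
    but it executes without raising."""

    def __init__(self, a: int) -> None:
        self.a = None


def weak_test(cls) -> bool:
    # Only checks that construction works; never inspects `.a`.
    cls(1)
    return True


def stronger_test(cls) -> bool:
    # Reads the attribute, so it can tell the mutant apart.
    return cls(1).a == 1


# The weak test passes for both classes, so the mutant would survive and
# point at a missing assertion; filtering it out hides that signal.
weak_results = (weak_test(Foo), weak_test(MutatedFoo))
strong_results = (stronger_test(Foo), stronger_test(MutatedFoo))
```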

I've added these notes to the README. I think the filter can still be useful if there are too many surviving mutants and you want to reduce the noise and/or improve performance. But it's not perfect, just a tradeoff.

Otto-AA marked this pull request as ready for review February 22, 2026 17:08
Otto-AA merged commit f377984 into main Feb 22, 2026
10 checks passed
Development

Successfully merging this pull request may close these issues.

Type hints/checking to reduce mutants

1 participant